Using SiteRank for Decentralized Computation of Web Document Ranking

Authors

  • Jie Wu
  • Karl Aberer

Abstract

The PageRank algorithm demonstrates the importance of computing document rankings that reflect general importance, or authority, in Web information retrieval. However, computing PageRank over the whole Web graph is both time-consuming and costly. State-of-the-art crawler-based search engines also suffer from the latency of retrieving a complete Web graph before PageRank can be computed. We study the problem of computing PageRank in a decentralized and timely fashion by using SiteRank and aggregating rankings from multiple sites. SiteRank is the ranking obtained by applying the classical PageRank algorithm to the graph of Web sites, i.e., the Web graph at the granularity of Web sites instead of Web pages. Our empirical results show that SiteRank also follows a power-law distribution. Our experimental results show that decomposing the global Web document ranking computation with SiteRank is a very promising approach for computing global document rankings in a decentralized Web search system. In particular, by sharing SiteRank among member servers, such a search system also gains a new means of fighting link spamming.
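
The idea of SiteRank can be illustrated with a small sketch: collapse page-level links into a site graph, run classical PageRank on it, run PageRank separately inside each site, and combine the two. This is a minimal sketch under stated assumptions, not the authors' implementation: identifying a site by its URL host name, dropping intra-site links and link multiplicities from the site graph, and the product rule in the hypothetical `global_rank` helper are all illustrative choices; the abstract does not specify how the paper aggregates rankings.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Classical PageRank by power iteration.

    nodes: iterable of node ids; edges: mapping node -> set of successors.
    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = list(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        # Teleport mass plus redistributed dangling mass, shared evenly.
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            out = edges.get(v)
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def site_of(url):
    # Assumption: a "site" is identified by the URL's host name.
    return urlsplit(url).hostname


def all_pages(page_links):
    """Every page that appears as a link source or a link target."""
    pages = set(page_links)
    for targets in page_links.values():
        pages.update(targets)
    return pages


def site_rank(page_links):
    """SiteRank: PageRank on the graph whose nodes are Web sites.

    Assumption: page links are collapsed into unweighted site-to-site
    links, and links within a single site are dropped.
    """
    sites = {site_of(p) for p in all_pages(page_links)}
    edges = defaultdict(set)
    for src, targets in page_links.items():
        for dst in targets:
            if site_of(src) != site_of(dst):
                edges[site_of(src)].add(site_of(dst))
    return pagerank(sites, edges)


def local_rank(page_links, site):
    """PageRank restricted to one site's pages and its intra-site links."""
    pages = {p for p in all_pages(page_links) if site_of(p) == site}
    edges = {p: {q for q in page_links.get(p, ()) if q in pages} for p in pages}
    return pagerank(pages, edges)


def global_rank(page_links):
    """Hypothetical composition: score(page) = SiteRank(site) * localRank(page)."""
    scores = {}
    for site, weight in site_rank(page_links).items():
        for page, local in local_rank(page_links, site).items():
            scores[page] = weight * local
    return scores
```

The product rule keeps the combined scores summing to 1, because each site's local ranks sum to 1 and the SiteRank vector sums to 1. It also fits the decentralized setting described above: each server can compute `local_rank` for the sites it hosts, and only the comparatively small SiteRank vector has to be shared among servers.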

Similar resources

Using SiteRank for P2P Web Retrieval

Studies of the Web graph at the granularity of documents have revealed many interesting link distributions. Similarly, studies of the Web graph at the granularity of Web sites, the so-called hostgraph, revealed relationships among hosts based on linkage and co-citation. However, to the best of our knowledge, the graph of Web sites has not been exploited for the purpose of ranking in search engi...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide results sorted according to the user's requirements. To achieve this aim, it employs ranking methods to rank Web documents based on their significance and relevance to the user's query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

A Framework for Decentralized Ranking in Web Information Retrieval

Search engines are among the most important applications or services on the web. Most existing successful search engines use global ranking algorithms to generate the ranking of documents crawled in their databases. However, global ranking of documents has two potential problems: high computation cost and potentially poor rankings. Both of the problems are related to the centralized computation...

Using a Layered Markov Model for Decentralized Web Ranking

The link structure of the Web graph is used in algorithms such as Kleinberg’s HITS and Google’s PageRank to assign authoritative weights to Web pages and thus rank them. In HITS, a solid theoretical model is lacking and the algorithm often leads to non-unique or non-intuitive rankings where zero weights may inappropriately be assigned to parts of a network. In PageRank, a model of random walks ...

An Ensemble Click Model for Web Document Ranking

Annually, Web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide useful information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

Publication date: 2004